abstract action
Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model
Fujii, Kentaro, Murata, Shingo
Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games
Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. These games typically fall into three categories based on the flow of control: strictly sequential (players alternate single actions), deterministic response (some actions trigger a fixed outcome), and unbounded reciprocal response (alternating counterplays are permitted). A less-explored but strategically rich structure is the bounded one-sided response, where a player's action briefly transfers control to the opponent, who must satisfy a fixed condition through one or more moves before the turn resolves. We term games featuring this mechanism Bounded One-Sided Response Games (BORGs). We introduce a modified version of Monopoly Deal as a benchmark environment that isolates this dynamic, where a Rent action forces the opponent to choose payment assets. The gold-standard algorithm, Counterfactual Regret Minimization (CFR), converges on effective strategies without novel algorithmic extensions. A lightweight full-stack research platform unifies the environment, a parallelized CFR runtime, and a human-playable web interface. The trained CFR agent and source code are available at https://monopolydeal.ai.
- Information Technology (1.00)
- Leisure & Entertainment > Games > Computer Games (0.93)
Multi-layer Abstraction for Nested Generation of Options (MANGO) in Hierarchical Reinforcement Learning
Arcudi, Alessio, Sartor, Davide, Sinigaglia, Alberto, François-Lavet, Vincent, Susto, Gian Antonio
This paper introduces MANGO (Multilayer Abstraction for Nested Generation of Options), a novel hierarchical reinforcement learning framework designed to address the challenges of long-term sparse reward environments. MANGO decomposes complex tasks into multiple layers of abstraction, where each layer defines an abstract state space and employs options to modularize trajectories into macro-actions. These options are nested across layers, allowing for efficient reuse of learned movements and improved sample efficiency. The framework introduces intra-layer policies that guide the agent's transitions within the abstract state space, and task actions that integrate task-specific components such as reward functions. Experiments conducted in procedurally-generated grid environments demonstrate substantial improvements in both sample efficiency and generalization capabilities compared to standard RL methods. MANGO also enhances interpretability by making the agent's decision-making process transparent across layers, which is particularly valuable in safety-critical and industrial applications. Future work will explore automated discovery of abstractions and abstract actions, adaptation to continuous or fuzzy environments, and more robust multi-layer training strategies.
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models
Dai, Hanzheng, Li, Yuanliang, Yan, Jun, Zhang, Zhibo
Automated penetration testing (AutoPT) powered by large language models (LLMs) has gained attention for its ability to automate ethical hacking processes and identify vulnerabilities in target systems by leveraging the inherent knowledge of LLMs. However, existing LLM-based AutoPT frameworks often underperform compared to human experts in challenging tasks for several reasons: the imbalanced knowledge used in LLM training, short-sightedness in the planning process, and hallucinations during command generation. Moreover, the trial-and-error nature of the PT process is constrained by existing frameworks lacking mechanisms to learn from previous failures, restricting adaptive improvement of PT strategies. To address these limitations, we propose a knowledge-informed, self-reflective PT framework powered by LLMs, called RefPentester. This AutoPT framework is designed to assist human operators in identifying the current stage of the PT process, selecting appropriate tactics and techniques for each stage, choosing suggested actions, providing step-by-step operational guidance, and reflecting on and learning from previous failed operations. We also modeled the PT process as a seven-state Stage Machine to integrate the proposed framework effectively. The evaluation shows that RefPentester can successfully reveal credentials on Hack The Box's Sau machine, outperforming the baseline GPT-4o model by 16.7%. Across PT stages, RefPentester also demonstrates superior success rates on PT stage transitions.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
LLM-Explorer: Towards Efficient and Affordable LLM-based Exploration for Mobile Apps
Zhao, Shanhui, Wen, Hao, Du, Wenjie, Liang, Cheng, Liu, Yunxin, Ye, Xiaozhou, Ouyang, Ye, Li, Yuanchun
Large language models (LLMs) have opened new opportunities for automated mobile app exploration, an important and challenging problem that used to suffer from the difficulty of generating meaningful UI interactions. However, existing LLM-based exploration approaches rely heavily on LLMs to generate actions in almost every step, leading to a huge cost of token fees and computational resources. We argue that such extensive usage of LLMs is neither necessary nor effective, since many actions during exploration do not require, or may even be biased by the abilities of LLMs. Further, based on the insight that a precise and compact knowledge plays the central role for effective exploration, we introduce LLM-Explorer, a new exploration agent designed for efficiency and affordability. LLM-Explorer uses LLMs primarily for maintaining the knowledge instead of generating actions, and knowledge is used to guide action generation in a LLM-less manner. Based on a comparison with 5 strong baselines on 20 typical apps, LLM-Explorer was able to achieve the fastest and highest coverage among all automated app explorers, with over 148x lower cost than the state-of-the-art LLM-based approach.
Exploring the limits of Hierarchical World Models in Reinforcement Learning
Schiewer, Robin, Subramoney, Anand, Wiskott, Laurenz
Hierarchical model-based reinforcement learning (HMBRL) aims to combine the benefits of better sample efficiency of model based reinforcement learning (MBRL) with the abstraction capability of hierarchical reinforcement learning (HRL) to solve complex tasks efficiently. While HMBRL has great potential, it still lacks wide adoption. In this work we describe a novel HMBRL framework and evaluate it thoroughly. To complement the multi-layered decision making idiom characteristic for HRL, we construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction. These models are used to train a stack of agents that communicate in a top-down manner by proposing goals to their subordinate agents. A significant focus of this study is the exploration of a static and environment agnostic temporal abstraction, which allows concurrent training of models and agents throughout the hierarchy. Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions. Although our HMBRL approach did not outperform traditional methods in terms of final episode returns, it successfully facilitated decision making across two levels of abstraction using compact, low dimensional abstract actions. A central challenge in enhancing our method's performance, as uncovered through comprehensive experimentation, is model exploitation on the abstract level of our world model stack. We provide an in depth examination of this issue, discussing its implications for the field and suggesting directions for future research to overcome this challenge. By sharing these findings, we aim to contribute to the broader discourse on refining HMBRL methodologies and to assist in the development of more effective autonomous learning systems for complex decision-making environments.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Learning Planning Abstractions from Language
Liu, Weiyu, Chen, Geng, Hsu, Joy, Mao, Jiayuan, Wu, Jiajun
This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than settings it is trained on.
- Workflow (0.68)
- Research Report (0.64)
Asynchronous Task Plan Refinement for Multi-Robot Task and Motion Planning
Sung, Yoonchang, Shome, Rahul, Stone, Peter
This paper explores general multi-robot task and motion planning, where multiple robots in close proximity manipulate objects while satisfying constraints and a given goal. In particular, we formulate the plan refinement problem--which, given a task plan, finds valid assignments of variables corresponding to solution trajectories--as a hybrid constraint satisfaction problem. The proposed algorithm follows several design principles that yield the following features: (1) efficient solution finding due to sequential heuristics and implicit time and roadmap representations, and (2) maximized feasible solution space obtained by introducing minimally necessary coordination-induced constraints and not relying on prevalent simplifications that exist in the literature. The evaluation results demonstrate the planning efficiency of the proposed algorithm, outperforming the synchronous approach in terms of makespan.
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)
Generalized Reinforcement Learning: Experience Particles, Action Operator, Reinforcement Field, Memory Association, and Decision Concepts
Chiu, Po-Hsiang, Huber, Manfred
Learning a control policy capable of adapting to time-varying and potentially evolving system dynamics has been a great challenge to the mainstream reinforcement learning (RL). Mainly, the ever-changing system properties would continuously affect how the RL agent interacts with the state space through its actions, which effectively (re-)introduces concept drifts to the underlying policy learning process. We postulated that higher adaptability for the control policy can be achieved by characterizing and representing actions with extra "degrees of freedom" and thereby, with greater flexibility, adjusts to variations from the action's "behavioral" outcomes, including how these actions get carried out in real time and the shift in the action set itself. This paper proposes a Bayesian-flavored generalized RL framework by first establishing the notion of parametric action model to better cope with uncertainty and fluid action behaviors, followed by introducing the notion of reinforcement field as a physics-inspired construct established through "polarized experience particles" maintained in the RL agent's working memory. These particles effectively encode the agent's dynamic learning experience that evolves over time in a self-organizing way. Using the reinforcement field as a substrate, we will further generalize the policy search to incorporate high-level decision concepts by viewing the past memory as an implicit graph structure, in which the memory instances, or particles, are interconnected with their degrees of associability/similarity defined and quantified such that the "associative memory" principle can be consistently applied to establish and augment the learning agent's evolving world model.
- North America > United States > Texas > Tarrant County > Arlington (0.14)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (6 more...)
- Workflow (0.94)
- Research Report (0.81)
- Education (1.00)
- Health & Medicine (0.88)
Using Deep Learning to Bootstrap Abstractions for Hierarchical Robot Planning
Shah, Naman, Srivastava, Siddharth
This paper addresses the problem of learning abstractions that boost robot planning performance while providing strong guarantees of reliability. Although state-of-the-art hierarchical robot planning algorithms allow robots to efficiently compute long-horizon motion plans for achieving user desired tasks, these methods typically rely upon environment-dependent state and action abstractions that need to be hand-designed by experts. We present a new approach for bootstrapping the entire hierarchical planning process. This allows us to compute abstract states and actions for new environments automatically using the critical regions predicted by a deep neural network with an auto-generated robot-specific architecture. We show that the learned abstractions can be used with a novel multi-source bi-directional hierarchical robot planning algorithm that is sound and probabilistically complete. An extensive empirical evaluation on twenty different settings using holonomic and non-holonomic robots shows that (a) our learned abstractions provide the information necessary for efficient multi-source hierarchical planning; and that (b) this approach of learning, abstractions, and planning outperforms state-of-the-art baselines by nearly a factor of ten in terms of planning time on test environments not seen during training.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Arizona (0.04)